Rapid Assessment of Extremal Statistics for Gapped Local Alignment
نویسندگان
چکیده
The statistical significance of gapped local alignments is characterized by analyzing the extremal statistics of the scores obtained from the alignment of random amino acid sequences. By identifying a complete set of linked clusters, "islands," we devise a method which accurately predicts the extremal score statistics by using only one to a few pairwise alignments. The success of our method relies crucially on the link between the statistics of island scores and extremal score statistics. This link is motivated by heuristic arguments, and firmly established by extensive numerical simulations for a variety of scoring parameter settings and sequence lengths. Our approach is several orders of magnitude faster than the widely used shuffling method, since island counting is trivially incorporated into the basic Smith-Waterman alignment algorithm with minimal computational cost, and all islands are counted in a single alignment. The availability of a rapid and accurate significance estimation method gives one the flexibility to fine tune scoring parameters to detect weakly homologous sequences and obtain optimal alignment fidelity.
منابع مشابه
Statistical significance and extremal ensemble of gapped local hybrid alignment
A “semi-probabilistic” alignment algorithm which combines ideas from Smith-Waterman and probabilistic alignment is proposed and studied in detail. It is predicted that the score statistics of this “hybrid” algorithm is of the universal Gumbel form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring parameters. We have also characterized the “extrema...
متن کاملLocal Gapped Subforest Alignment and Its Application in Finding RNA Structural Motifs
RNA molecules whose secondary structures contain similar substructures often have similar functions. Therefore, an important task in the study of RNA is to develop methods for discovering substructures in RNA secondary structures that occur frequently (also referred to as motifs). In this paper, we consider the problem of computing an optimal local alignment of two given labeled ordered forests...
متن کاملEstimating the Gumbel Scale Parameter for Local Alignment of Random Sequences by Importance Sampling with Stopping Times.
The gapped local alignment score of two random sequences follows a Gumbel distribution. If computers could estimate the parameters of the Gumbel distribution within one second, the use of arbitrary alignment scoring schemes could increase the sensitivity of searching biological sequence databases over the web. Accordingly, this article gives a novel equation for the scale parameter of the relev...
متن کاملGapped Extension for Local Multiple Alignment of Interspersed DNA Repeats
The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered poor scalability and limited accurac...
متن کاملA General Framework for Local Pairwise Alignment Statistics with Gaps
We present a novel dynamic programming framework that allows one to compute tight upper bounds for the p-values of gapped local alignments in pseudo–polynomial time. Our algorithms are fast and simple and unlike most earlier solutions, require no curve fitting by sampling. Moreover, our new methods do not suffer from the so–called edge effects, a by–product of the common practice used to comput...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Proceedings. International Conference on Intelligent Systems for Molecular Biology
دوره شماره
صفحات -
تاریخ انتشار 1999